Word Formation in Computational Linguistics

Authors

Pius ten Hacken

Anke Lüdeling

Abstract

Where does word formation information play a role in computational linguistics? Think of information retrieval, where the parts of a word might contain the important information. If we want to find information in German on relaxation techniques, we need to look for the verb entspannen 'to relax' as well as for the derived noun Entspannung 'relaxation' and compounds containing Entspannung, such as Entspannungsantwort 'relaxation response', Tiefenentspannung 'deep relaxation', or Entspannungsübung 'relaxation exercise'. Another example is text-to-speech systems, where the structure of a word can tell us where the stress goes. Consider English words containing so-called neoclassical affixes (affixes of Latin or Greek etymology). Some of these affixes, such as -ation, influence the stress of a word: re'lax vs. relax'ation. Even such seemingly 'basic' components as part-of-speech taggers can (and often do) use word formation information, if only as heuristics. If a tagger encounters an unknown English word ending in the letters , for example, it can guess that this must be an adjective.

Word formation information is thus important for many computational linguistics applications. But why do we need a word formation component? There are large machine-readable lexicons available today. Why can't we just use these? First, we have to distinguish between computational linguistic applications that deal with a fixed text or set of texts (which can, of course, work with a finite lexicon) and applications that deal with unseen text. By application we mean 'higher' systems, such as machine translation or text understanding systems, that include a number of components. These applications make use of basic components such as taggers, lemmatizers/stemmers, and parsers. Often these components (and accordingly the application) – no matter ...
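The German retrieval example in the abstract can be illustrated with a minimal sketch. It assumes that a plain case-insensitive substring test is an acceptable stand-in for real morphological analysis, which it is not in general: it will over-match words that merely happen to contain the string, and it misses forms whose stem changes. The function names `contains_stem` and `expand_query`, the `VOCABULARY` list, and the two distractor words are hypothetical and do not come from the paper.

```python
def contains_stem(word: str, stem: str) -> bool:
    """Return True if the stem occurs anywhere in the word (case-insensitive)."""
    return stem.casefold() in word.casefold()


def expand_query(stem: str, vocabulary: list[str]) -> list[str]:
    """Collect every vocabulary item that contains the stem, in input order."""
    return [word for word in vocabulary if contains_stem(word, stem)]


# The five word forms named in the abstract, plus two made-up distractors
# ("Spannung", "Übung") that should not be retrieved.
VOCABULARY = [
    "entspannen",
    "Entspannung",
    "Entspannungsantwort",
    "Tiefenentspannung",
    "Entspannungsübung",
    "Spannung",
    "Übung",
]

matches = expand_query("entspann", VOCABULARY)
print(matches)
```

A search for the exact string "Entspannung" as a whole word would miss the verb and all three compounds; matching on a shared part of the word retrieves them, which is the point the abstract makes about word-internal structure.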


Similar resources

Word-Forming Process in Azeri Turkish Language

This study examines the general methods of natural word formation in Azeri Turkish by analyzing the construction of compound Azeri Turkish words. Same’ei (2016) carried out a comprehensive study of the word-formation process in Farsi, which inspired this study of word formation in Azeri Turkish. Numerous scholars had done vari...

Producing a Persian Text Tokenizer Corpus Focusing on Its Computational Linguistics Considerations

The main task of tokenization is to divide the sentences of a text into their constituent units and to remove punctuation marks (periods, commas, etc.). Each unit is a continuous lexical or grammatical string that forms an independent semantic unit. Tokenization occurs at the word level, and the extracted units can be used as input to other components such as a stemmer. The requirement to create...

Improving the lexical coverage of English compound adjectives in syntactic parsing

The present paper addresses the question how in syntactic parsing the coverage of words in previously unseen text may be improved. The adjectives in English are presented here as a case study. Working on the assumption that most new words that are introduced into the language are constructed on the basis of already existing words through the application of word-formation processes, we investiga...

Improving the lexical coverage of English compound adjectives

Last Words: Natural Language Processing and Linguistic Fieldwork

March 2009 marked an important milestone: the First International Conference on Language Documentation and Conservation, held at the University of Hawai‘i. The scale of the event was striking, with five parallel tracks running over three days. The organizers coped magnificently with three times the expected participation (over 300). The buzz among the participants was that we were at the start...


Publication date: 2002
